-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bulk, backupccl: introduce a Structured event Aggregator #84043
Conversation
d19ce3a
to
34e7ce7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @rhu713, and @stevendanna)
pkg/util/bulk/aggregator.go
line 64 at r1 (raw file):
b.mu.sp.SetLazyTag(eventTag, b.mu.aggregatedEvents[eventTag]) } else { b.mu.aggregatedEvents[eventTag].Combine(bulkEvent)
does this change an event that's part of the structured recording, or does it change a copy that's only used for the tag?
pkg/util/bulk/aggregator.go
line 83 at r1 (raw file):
// The Aggregator instance is responsible for finishing the returned span, and // so the user must call Close(). func MakeAggregatorWithSpan(
I don't need to care about this code, so feel free to ignore, but I for one would not intertwine the aggregator and the span. I'd have callers create a span explicitly, then attach an aggregator to the span. The aggregator would not be in charge of closing the span.
pkg/util/tracing/crdbspan.go
line 512 at r1 (raw file):
for _, span := range childRecording { for _, record := range span.StructuredRecords { s.notifyEventListeners(record.Payload)
remind me why this had to change pls
34e7ce7
to
6b36270
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @dt, @rhu713, and @stevendanna)
pkg/util/bulk/aggregator.go
line 64 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
does this change an event that's part of the structured recording, or does it change a copy that's only used for the tag?
good catch, added a Clone method to the interface that returns a copy the first time we assign it a slot in the map.
pkg/util/bulk/aggregator.go
line 83 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't need to care about this code, so feel free to ignore, but I for one would not intertwine the aggregator and the span. I'd have callers create a span explicitly, then attach an aggregator to the span. The aggregator would not be in charge of closing the span.
I did try this before the current implementation, but it wasn't very pretty. Since event listeners have to be registered at span creation time, it involved plumbing the event listener through a few methods in processorbase starting from bp.StartInternal(ctx, backupProcessorName)
.
pkg/util/tracing/crdbspan.go
line 512 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
remind me why this had to change pls
https://github.com/cockroachdb/cockroach/pull/84043/files#diff-d5042f9b360d17b5e8009b0cedd5b8a7c490b64b4b41c6958b4e5fd99074ad57R47 in the aggregator we type check that the structured event is an AggregatorEvent
. When importing remote spans we notify the event listeners on the span being imported into of these events. Before this diff we would pass along record.Payload
which is of type *types.Any
. Casting record.Payload
to the AggregatorEvent
interface type will always fail, since it is the underlying protoutil.Message
that implements AggregatorEvent
. So we first unmarshal the types.Any into a types.DynamicAny which would give us access to the concrete proto type.
@@ -930,6 +935,9 @@ func (s *crdbSpan) getRecordingNoChildrenLocked( | |||
childKey := string(tag.Key) | |||
childValue := tag.Value.Emit() | |||
|
|||
if rs.Tags == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, didn't you fix this already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, I rebased on master a couple days ago. When we chatted it was me working on this diff its just been sitting in cold storage since. TFTR! I'll wait on Steven to give it a pass since I bugged him to before you stamped it, and then merge away.
Approved for backupccl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, I left some comments, but I leave them to your discretion, I don't think they are blocking.
It might be nice to include in the already good commit message a line or two about how I might see these aggregated events -- just so anyone who sees this new feature knows how to start experimenting with it.
if err := types.UnmarshalAny(record.Payload, &d); err != nil { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it doesn't matter too much, but I kinda wonder if we can hold off on Unmarshaling this until we know there are event listeners in place for this span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd either have to change the notifyEventListeners
signature to take a *types.Any
or inside notifyEventListeners
we will have to MarshalAny
the passed initem Structured
before we UnmarshalAny it into a DynamicAny. I think I'm going to leave it as is for now since this code is only called when importing remote spans into our span, not when every structured event is emitted.
6b36270
to
bae4969
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code LGTM. I just have a slight design input that may improve things down the line.
a507887
to
d1bb4b8
Compare
d1bb4b8
to
fc36b3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, @benbardin, @dt, @rhu713, and @stevendanna)
pkg/util/bulk/aggregator.go
line 23 at r4 (raw file):
// Aggregator. An AggregatorEvent also implements the tracing.LazyTag interface // to render its information on the associated tracing span. type AggregatorEvent interface {
given that there is only one type of event implementing this, is the interface worth it?
pkg/util/bulk/aggregator.go
line 37 at r4 (raw file):
// An Aggregator can be used to aggregate and render AggregatorEvents that are // emitted as part of its tracing spans' recording. type Aggregator struct {
consider naming this TracingEventAggregator
, or something more specific than simply Aggregator
.
pkg/util/bulk/aggregator.go
line 61 at r4 (raw file):
// If this is the first AggregatorEvent with this tag, set it as a LazyTag on // the associated tracing span. This way the AggregatorEvent will be
nit: scratch the second sentence. Every user of a tag is not expected to explain what the point of tags are.
pkg/util/bulk/aggregator.go
line 96 at r4 (raw file):
sp := tracing.SpanFromContext(ctx) aggCtx, aggSpan := sp.Tracer().StartSpanCtx(ctx, aggregatorName,
use EnsureChildSpan
, and then you don't need to deal with the parent.
pkg/util/bulk/aggregator.go
line 97 at r4 (raw file):
aggCtx, aggSpan := sp.Tracer().StartSpanCtx(ctx, aggregatorName, tracing.WithEventListeners([]tracing.EventListener{agg}), tracing.WithParent(sp))
let's make WithEventListener
a variadic function (take ...EventListener
)
This change introduces an Aggregator object that is capable of listening for Structured events emitted in a recording, aggregating them and rendering them as LazyTags. We also introduce an AggregatorEvent interface that can be implemented by a Structured event thereby making it eligible for aggregation in the Aggregator. The first user of the Aggregator will be every backup data processor that is setup on the nodes in the cluster during a backup. The Aggregator lives as long as the processor, and listens for Aggregator events emitted by any span that is a child of the processors' span. This includes both local children as well as remote children whose recordings have been imported into a local span. The Aggregator stores running aggregates of each AggregatorEvent it is notified about, bucketed by the events' tag name. This aggregate will be rendered on the tracing span as a LazyTag. This change teaches every ExportRequest to emit an AggregatorEvent. Going forward we expect many more operations in bulk jobs to define and emit such events providing visibility into otherwise opaque operations. We cleanup some of the StructuredEvents that were previously added but have not proved useful, and also change some of the tracing span operation names to be more intuitive. To view these aggregated events once can navigate to the `/tracez` endpoint of the debug console to take a snapshot and search for either `BACKUP` or the job ID to filter for tracing spans on that node. The span associated with the backup processor will be decorated with tags that correspond to the fields in the introduced `ExportStats` proto message. Note these stats are only aggregated on a per node basis. Fixes: cockroachdb#80388 Release note: None
fc36b3c
to
ae1804b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @benbardin, @dt, @rhu713, and @stevendanna)
pkg/ccl/backupccl/backuppb/backup.go
line 176 at r2 (raw file):
Previously, stevendanna (Steven Danna) wrote…
We could log the type of other here.
Done.
pkg/util/bulk/aggregator.go
line 23 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
given that there is only one type of event implementing this, is the interface worth it?
I'm bullish that there'll be more very soon. Especially in import/restore to surface AddSSTable, BulkAdder stats etc.
pkg/util/bulk/aggregator.go
line 37 at r4 (raw file):
TracingEventAggregator
done.
pkg/util/bulk/aggregator.go
line 96 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
use
EnsureChildSpan
, and then you don't need to deal with the parent.
Done.
pkg/util/bulk/aggregator.go
line 97 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
let's make
WithEventListener
a variadic function (take...EventListener
)
Done.
pkg/util/bulk/aggregator_test.go
line 60 at r2 (raw file):
Previously, stevendanna (Steven Danna) wrote…
// have not imported the remote child's Recording.
or, I suppose,
// have not imported the remote children's Recordings.
Done.
TFTRs! bors r+ |
Build failed (retrying...): |
Build succeeded: |
The "sending ExportRequest for span" log message was the 5th most voluminous log event source in CC. This commit makes it less verbose, by importing a one-line change from another merged PR cockroachdb#84043 from v22.2. Release note: None
The "sending ExportRequest for span" log message was the 5th most voluminous log event source in CC. This commit makes it less verbose, by importing a one-line change from another merged PR cockroachdb#84043 from v22.2. Release note: None
The "sending ExportRequest for span" log message was the 5th most voluminous log event source in CC. This commit makes it less verbose, by importing a one-line change from another merged PR cockroachdb#84043 from v22.2. Release note: None
This change introduces an Aggregator object that is capable
of listening for Structured events emitted in a recording,
aggregating them and rendering them as LazyTags.
We also introduce an AggregatorEvent interface that can be
implemented by a Structured event thereby making it eligible
for aggregation in the Aggregator.
The first user of the Aggregator will be every backup data
processor that is setup on the nodes in the cluster during
a backup. The Aggregator lives as long as the processor, and
listens for Aggregator events emitted by any span that is a child
of the processors' span. This includes both local children as
well as remote children whose recordings have been imported into
a local span. The Aggregator stores running aggregates of each
AggregatorEvent it is notified about, bucketed by the events'
tag name. This aggregate will be rendered on the tracing span as a LazyTag.
This change teaches every ExportRequest to emit an AggregatorEvent.
Going forward we expect many more operations in bulk jobs to
define and emit such events providing visibility into otherwise
opaque operations.
We cleanup some of the StructuredEvents that were previously
added but have not proved useful, and also change some of the
tracing span operation names to be more intuitive.
To view these aggregated events once can navigate to the
/tracez
endpoint of the debug console to take a snapshot and search for eitherBACKUP
or the job ID to filter for tracing spans on that node.The span associated with the backup processor will be decorated with
tags that correspond to the fields in the introduced
ExportStats
proto message. Note these stats are only aggregated on a per node
basis.
Fixes: #80388
Release note: None
Release justification: low risk change for improved observability into backups.